Screening and verification of hub genes in esophageal squamous cell carcinoma by integrated analysis

Esophageal squamous cell carcinoma (ESCC) is one of the most common malignant tumors. However, the mechanisms underlying ESCC tumorigenesis have not been fully elucidated. Thus, we aimed to determine the key genes involved in ESCC tumorigenesis. The following bioinformatics analyses were performed: identification of differentially expressed genes (DEGs); gene ontology and Kyoto Encyclopedia of Genes and Genomes pathway enrichment analysis; integrated analysis of the protein–protein interaction network and Gene Expression Profiling Interactive Analysis database for validation of hub genes. Finally, western blotting and qPCR were used to explore the expression of cell division cycle 6 (CDC6) in ESCC cell lines. Immunohistochemistry analysis of ESCC samples from patients and matched clinical characteristics was used to determine the effects of CDC6. A total of 494 DEGs were identified, and functional enrichment was mainly focused on cell cycle and DNA replication. Biological pathway analysis of the hub genes was closely related to the cell cycle. We found that CDC6 was upregulated in ESCC cell lines and patient tissues and was related to the clinicopathological characteristics of ESCC. In conclusion, this study identified hub genes and crucial biological pathways related to ESCC tumorigenesis and integrated analyses indicated that CDC6 may be a novel diagnostic and therapeutic target for ESCC.


Protein-protein interaction (PPI) network and module analysis
The PPI network of the DEGs was constructed using the STRING database and Cytoscape software.The top three significant modules (Supplementary Material 3) were selected using the MCODE plug-in in Cytoscape (Module 1: MCODE score = 66.056;Module 2: MCODE score = 21.758;Module 3: MCODE score = 8.621) (Fig. 2A).We then analyzed the functions of these modules.The enriched KEGG pathway showed that Module 1 consisted of 73 nodes and 2378 edges mainly enriched in the cell cycle, DNA replication, and oocyte meiosis.Module 2, containing 34 nodes and 359 edges, was primarily correlated with protein digestion and absorption by the ECM receptor.Module 3, which included 30 nodes and 125 edges, primarily correlated with cytokine-cytokine receptor interactions, chemokine signaling pathways, and IL-17 signaling pathways [13][14][15][16] (Fig. 2B).

Expression level of hub genes
Based on the degree of connectivity, the top 10 significant hub genes were filtered (Table 2).We created a heatmap of the 10 most essential hub genes, based on four data series (GSE23400, GSE38129, GSE20347, and GSE29001) and found that the hub genes could distinguish ESCC samples from normal samples (Fig. 3A).The PPI network of the 10 hub genes was predicted using the STRING database and constructed using Cytoscape (Fig. 3B).The biological pathways for the top 10 hub genes were cell cycle, mitosis, cell cycle checkpoints, G2/M checkpoints, G0

Genes name
and Early G1, DNA replication, and Mitotic G1-G1-/S phases (Fig. 3C).Furthermore, we constructed a network of hub and co-expressed genes (Fig. 3D) and performed gene co-expression analysis of the hub genes using the STRING database and FunRich software (version 3.1.3;http:// www.funri ch.org/), which indicated that the hub genes may strongly interact with each other (Fig. 3E).

Expression verification of hub genes and genetic alteration
To investigate the transcriptional expression difference of the hub genes between tumor and normal tissues, we analyzed the datasets from the Gene Expression Profiling Interactive Analysis (GEPIA) database using one-way analysis of variance (ANOVA) (cut-off criteria: |log2FC|= 1 and p = 0.01).As shown in Fig. 4A, compared with normal tissues, the genes were significantly upregulated in tumor tissues.We then investigated the genetic alterations in the hub genes, and found that RFC4 and CDC6 had the top two frequencies of alteration (35% and 4%, respectively), including missense mutations, truncation mutations, amplifications, and deep deletions (Fig. 4B).These results suggest that RFC4 and CDC6 may play roles in the development and progression of ESCC and remained of great significance to further research.

Receiver operating characteristic (ROC) analysis of hub genes and drug-hub gene interactions
ROC curve analysis was performed to assess the diagnostic value of the top ten hub genes for ESCC (Fig. 5A).The expression levels of hub genes were extracted from four databases (GSE23400, GSE38129, GSE20347, and GSE29001).The area under the curve (AUC) of these genes ranged from 0.8353 to 1.000.Furthermore, we identified promising drugs associated with these hub genes.By searching druggable gene categories in DGIdb, we found that most hub genes matched with tumor suppressor genes (Table 3).Finally, 35 Food and Drug Administration (FDA) approved drugs (Table 4) targeting CDK1, TOP2A, AURKA, and PCNA were identified, with a large majority of the drugs inhibiting TOP2A.Furthermore, a downstream network of the 10 hub genes was constructed using the STITCH database (Fig. 5B).These modules showed that the hub genes were mainly associated with CDK, CDC, and MCM family members, and drugs including doxorubicin, rapamycin, etoposide, paclitaxel, amsacrine, levofloxacin, alisertib, MLN8054, aminopurvalano, dexrazoxane, MgATP, and MgADP.

High expression levels of CDC6 and its clinicopathologic characteristics in ESCC
Through extensive literature review, we found that RFC4, with the highest mutation frequency, has been reported in ESCC, whereas few studies have focused on CDC6 in ESCC.To determine whether CDC6 plays a crucial role in ESCC, we evaluated the expression levels of CDC6 in ESCC cell lines.Both qPCR and western blotting showed that, compared to that in human esophageal epithelial cells, CDC6 was upregulated in ESCC cell lines (Fig. 6A,B).Furthermore, to detect the expression of CDC6 in ESCC samples, we performed immunohistochemical staining.
Of the 35 samples, 26 exhibited high CDC6 expression levels, while the remaining samples showed low CDC6 expression (Fig. 6C,D).Statistical analysis of the H-score showed that, compared to normal esophageal tissues, CDC6 was upregulated in ESCC tissues (Fig. 6E).Therefore, we aimed to determine the relationship between CDC6 and the clinicopathological characteristics of ESCC using CDC6 H-scores (Supplementary Material 4).We selected five terms: age, grade, tumor size, lymph node metastasis, and stage.We found that the gene expression level of CDC6 was related to tumor size, age, lymph node metastasis, and stage, whereas it was not associated with tumor grade (Fig. 6F,G).

Discussion
Although many studies have focused on ESCC tumorigenesis 17 , the incidence of ESCC remains high and its prognosis poor.Therefore, in this study, using expression data obtained from four datasets, we aimed to identify the critical genes and biological pathways involved in ESCC tumorgenesis.We identified 490 DEGs, and GO enrichment analysis showed that the identified DEGs were mainly enriched in the mitotic cell cycle phase transition, DNA replication origin binding, and biological pathways primarily enriched in the cell cycle, DNA replication, and protein digestion and absorption, indicating that the DEGs may be associated with the progression of ESCC.www.nature.com/scientificreports/ We constructed a PPI network and selected the top three valuable modules.The biological pathway analyses of Module 1 revealed that the modules were mainly related to cell cycle and DNA replication, indicating that the genes may participate in the regulation of ESCC proliferation.Deregulation in proliferation remains is key to tumor development 18 .We found that part of Module 1 is related to the cell cycle and DNA replication, and previously, researchers have found that RFC4 is associated with increased DNA copy number alterations in ESCC 19 .CDK1, CCNB1, and CCNB2 are related to the cell cycle and are altered in ESCC 20 .Matrix metalloproteinases (MMPs) and collagen family members constitute the majority of Module 2. MMPs are involved in tumor invasion and metastasis.For instance, MMP13 participates in the proliferation and invasion of ESCC 21 .MMP9 can degrade Collagen IV in the basement membrane, which could promote invasion 22 .The ectopic expression of the collagen family is associated with the prognosis of some cancers [23][24][25][26] .The collagen family is associated with invasion and migration, which participate in epithelial to mesenchymal transition (EMT) and extracellular matrix remodeling 27 .Several genes of the collagen family have been identified as potential diagnostic and prognostic biomarkers for ESCC; however, the underlying molecular mechanisms need to be further expored 28,29 .The PPI network of Module 3 was mainly connected to the chemokine signaling pathway, cytokine-cytokine receptor interaction, and the IL-17 signaling pathway.Chemokine signaling pathways cause differences in the proliferation, angiogenesis, EMT, and metastasis of various cancers.CXCL 1 6/CXCR6 chemokine signaling mediates breast cancer 30 .IL-8 and CXCR-1 are involved in the EMT in colon carcinoma 31 .GROalpha-CXCR2 and GRObeta-CXCR2 signaling play crucial roles in ESCC cell proliferation 32 .
Based on the degree of connectivity, we considered the top 10 genes as hub genes.Both the heatmap and the results from the GEIPA database showed that the expression levels of the genes were significantly higher in tumor samples than in normal tissues, indicating that the genes may play key a role in the development of ESCC.The biological pathways of the 10 hub genes were mainly associated with cell cycle transition, indicating that the genes may affect the proliferation of tumor cells.Cyclin B1(CCNB1) silencing inhibited cell proliferation and facilitated senescence in pancreatic cancer 33 .CCNB1 is involved in the pathogenesis of esophagus carcinoma, and the CCNB1 upregulation is associated with a poor prognosis in patients with ESCC 34 .PCNA serves as a moving platform that allows DNA-and chromatin-interacting proteins to operate at the fork in a DNA sequenceindependent manner, and targeting PCNA-1 can inhibit the proliferation of lung cancer cells 28,35,36 .Thus, the functions of the genes correspond to their respective biological pathways.Additionally, we constructed an interactive network including the 10 hub genes and the top 60 frequently altered neighboring genes in ESCC.Several significant genes, such as TP53, EGFR, and PARP1 37,38 had direct or indirect connections to the 10 hub genes.These interactions indicate that hub genes may participate in ESCC tumorigenesis.The molecular mechanisms underlying these correlations warrant further study.
Using the hub genes, we searched for candidate drugs and 35 drugs with therapeutic efficacy against ESCC were identified.Four of the 10 hub genes, CDK1, TOP2A, AURKA, and PCNA, may be the potential drug targets.Notably, TOP2A is a promising target for ESCC 20 .In addition to doxorubicin, rapamycin, paclitaxel, and etoposide, we identified new drugs, such as levofloxacin, dexrazoxane, and amsacrine, that have not been used in ESCC.Further studies and clinical trials are required to explore their effects in ESCC.Most of the hub genes interacted with MgATP, MgADP, and phosphate.Intracellular ATP is critical for chemoresistance in colon cancer cells 39 and the effect of drug-gene interactions on ESCC chemoresistance requires further research.
Alterations in hub genes included missense mutations, truncating mutations, amplification, and deep deletions.RFC4, with the highest mutation frequency, was upregulated in the early stage and was associated with early nodal metastases and tumor immunity in ESCC 19 .CDC6, with the second highest alteration frequency, has rarely been reported in ESCC.CDC6 forms part of the pre-replication complex that controls DNA replication licensing in the cell cycle 40 .Previous studies have shown that CDC6 was overexpressed in other tumors [41][42][43] .The expression level of CDC6 in ovarian cancer tissues was significantly higher than that in adjacent tissues and it was related to tumor stage, differentiation degree, lymph node metastasis, ascites and prognosis, which was an independent factor of ovarian cancer patients 44 .Mahadevappa et al. suggested that CDC6 expression was significantly increased in breast cancer tissues and correlated with poor prognosis of patients 45 .Further studies showed that down-regulation of CDC6 expression in bladder cancer could significantly inhibit a variety of malignant phenotypes of tumor cells 46 .After knocking out CDC6, the proliferation of tongue squamous cell carcinoma cells was inhibited 47 .
Additionally, CDC6 upregulation represses E-cadherin correlates with EMT 48 .CDC6 is also involved in the progression of ESCC induced by RFBP-and circular RNA circNELL2 49,50 .Consistent with the results from the bioinformatics analysis, CDC6 was upregulated in the ESCC cell lines.Furthermore, we explored the clinical significance of the findings by analyzing the CDC6 H-scores.The tumor samples had higher H-scores than normal samples.CDC6 expression was associated with tumor size, lymph node metastasis, and disease stage, indicating that CDC6 may promote the proliferation and invasion of tumor cells and serves as a novel ESCC therapeutic target.This study has some limitations.First, the number of clinical samples used to investigate the expression level of CDC6 was relatively small.Additionally, the effect of CDC6 on the prognosis of patients with ESCC remains unclear.The molecular mechanisms and biological effects of CDC6 in ESCC have not been fully elucidated.

Conclusions
Using comprehensive bioinformatics analysis and in vitro assays, we demonstrated that the CDC6 gene plays a key role in the progression of ESCC and serves as a novel potential biomarker and therapeutic target for ESCC.www.nature.com/scientificreports/

Data collection and DEGs extraction
Gene expression profiles of ESCC and adjacent normal tissues (GSE23400, GSE38129, GSE20347, and GSE29001) were downloaded from the GEO database (http:// www.ncbi.nlm.nih.gov/ geo/).The GSE23400 dataset consisted of 53 pairs of tumor tissues and matched noncancerous samples.GSE38129 consisted of 30 pairs of tumor and normal tissue samples.GSE20347 consisted of 17 pairs of tumor and normal esophageal tissue samples.GSE29001 consisted of 21 tumor samples and 24 normal tissues.DEGs were identified using R, and the significance procedures were as follows: the Affy package was used to perform background corrections and normalize the data, and then the Limma package was used to perform differential expression analysis.p < 0.05 and logFC (fold change) > 1 were set as cut-off criteria.Volcano plots for each dataset was drawn in R using the ggplot2 package, and the overlapping DEGs among the four microarrays were visualized using the Venn diagram tool (https:// bioin fogp.cnb.csic.es/ tools/ venny/).

Functional enrichment analyses
GO enrichment and KEGG pathway analyses were performed using the DAVID (available online: http:// david.ncifc rf.gov).The GO results of crucial terms for cellular component (CC), biological process (BP), and molecular function (MF) were obtained by importing the DEGs into DAVID, and p < 0.01 was considered statistically significant.

PPI network construction and analysis
PPI network comprising DEGs and hub genes were constructed using Metascape online analyses (https:// metas cape.org/ gp/ index.html#/ main/ step1), which could predict the interactions of proteins, and a combined score > 0.4 was considered statistically significant.Molecular interaction networks were constructed using Cytoscape (version 3.1.2;https:// cytos cape.org/ relea se_ notes_3_ 2_1.html).The three most significant modules were identified by Molecular Complex Detection (MCOD, plug-in in Cytoscape software), which had MCODE scores > 3, false degree cut-off = 2, node score cut-off = 0.2, maximum depth = 100, and false k-score = 2. Functional enrichment analysis of each module was conducted using Metascape.

Identification, analysis, and validation of hub genes
The top ten genes with the highest degrees of connectivity were selected as hub genes.The hub genes within the four ESCC databases were visualized using cluster heatmaps drawn using GraphPad Prism heatmap (version 6.0; http:// uone-tech.cn/ graph pad-prism.html).The biological pathway and co-expression analyses of the ten hub genes was conducted using STRING (https:// string-db.org/).We validated the expression levels and connections of the ten hub genes in ESCC tissues and matched normal tissues using GEPIA (http:// gepia.cancer-pku.cn/ index.html), an online database for analyzing gene expression profiles of tumors.The genetic alterations of the ten selected hub genes in ESCC were determined using cancer genomics prolifers obtained from the cBioCancer Genomic Portal (http:// www.cbiop ortal.org/), which contains a large number of cancer genomics datasets.

Analysis of drug-hub gene interactions
The 10 hub genes served as the promising targets for searching for candidate drugs through the DGIdb (http:// dgidb.genome.wustl.edu/), drugs from more than one database, or PubMed references that are approved by the FDA.The target network of the hub genes was constructed using STITCH (http:// stitch.embl.de/).

Western blot analysis
Cells were lysed using a RIPA lysis buffer (Beyotime, China), supplemented with 1 mM phenylmethylsulfonyl fluoride (Beyotime, China) and 1 mM phosphatase inhibitor (MCE, USA).Protein concentrations were quantified using an Enhanced BCA Protein Assay Kit (Beyotime,China), according to the manufacturer's instructions.Protein samples were added to SDS loading buffer and boiled at 95 °C for 10 min.The denatured protein samples were electrophoresed on a 10% SDS-PAGE gel (Biosharp, China) and transferred onto a polyvinylidene fluoride (PVDF) membrane (Merck Millipore, Germany) using the wet transfer method (we cut the SDS-PAGE gels according to the position of target protein before the protein samples were transferred onto PVDF membranes).

Figure 1 .
Figure 1.Identification of common DEGs, as well as GO and KEGG pathway enrichment analyses of identified DEGs.(A) Volcano plot of detectable genome-wide mRNA profiles in four independent GEO datasets, Red indicates DGEs with p < 0.05 and logFC (fold change) > 1. (B) Different color areas represented different datasets (blue oval represents GSE23400; yellow represents GSE38129; green represents GSE20347; pink represents GSE29001).Overlapping area in Venn diagrams represents overlapping DEGs among the four datasets.(C) GO analysis of DEGs in ESCC, including biological process, cell component, and molecular function.(D) KEGG pathway enrichment analysis of upregulated DEGs.DEGs, differentially expressed genes; GO, Gene Ontology; KEGG, Kyoto Encyclopedia of Genes and Genomes.

Figure 2 .
Figure 2. Significant modules and enrichment analyses of DEGs.(A) Top three modules from PPI network of DEGs.(B) Results of KEGG pathway analysis of DEGs in these three modules.

Figure 3 .
Figure 3. Comprehensive analysis of hub genes.(A) Hierarchical clustering heatmap of hub genes from four databases.(B).PPI network of the top 10 hub genes through the STRING database.(C) Biological pathway for the top 10 hub genes.(D).Network of the top 10 hub genes and the other 50 frequently altered genes; red nodes represent hub genes and green nodes represent the co-expression genes.(E).Results of co-expression analysis of the top 10 hub genes.

Figure 4 .
Figure 4. Validation of the top 10 gene expression levels based on the GEPIA.(A) Expression levels of top 10 hub genes in human ESCC.Gray and red boxes represent normal and tumor tissues, respectively.The results were consistent with the preceding consequence of this study.(B) Summary of genetic alternation of the top 10 hub genes in ESCC.*p < 0.05.

Figure 5 .
Figure 5. (A) ROC curves for ESCC diagnosis according to the AUC, red curve represents the GSE23400 dataset; green the GSE38129 dataset; blue the GSE20347 dataset; and purple the GSE29001 dataset.(B) Networks of drug-hub gene interactions.ROC, Receiver operator characteristic and AUC, area under the curve.

Table 1 .
. 348 upregulated and 146 downregulated DEGs were identified from the four GEO datasets of ESCC.

Table 2 .
Top ten hub genes identified by MCODE scores with Degree method.

Table 4 .
FDA approved drugs targeting hub genes of ESCC.